Abstract The biological world, especially its majority microbial component,
is strongly interacting and may be dominated by collective effects. In this 
review, we provide a brief introduction for statistical physicists of the way 
in which living cells communicate genetically through transferred genes, as 
well as the ways in which they can reorganize their genomes in response to 
environmental pressure. We discuss how genome evolution can be thought of 
as related to the physical phenomenon of annealing, and describe the sense 
in which genomes can be said to exhibit an analogue of information entropy. 
As a direct application of these ideas, we analyze the variation with ocean 
depth of transposons in marine microbial genomes, predicting trends that 
are consistent with recent observations using metagenomic surveys. 
1 Introduction

The advent of high-throughput sequencing technologies and their application
to genomics and metagenomics provides an ever-growing torrent of fine-grained
data about ecosystems and is beginning to alter our views on biological
evolution [37, 22]. In particular, it is becoming
clear that the biological world, especially its majority microbial component, 
is strongly interacting and may be dominated by collective effects. Such phe- 
nomena can arise of course through cell-to-cell communication by signaling 
molecules [BH], but there is also a genetic mode of communication: genes can 
be transferred between living cells not related by heredity, and subsequently 
expressed. Comparative genomics indicates the widespread and frequent
presence of this so-called "horizontal gene transfer" (HGT) between organisms,
not only those that are closely related but also those from distant taxa.
Metagenomic surveys also reveal an unsuspected abundance of mobile genetic
elements (MGEs) such as plasmids, viruses, and transposons, which exceed
organismal abundances by an order of magnitude in many different ecosystems
[30, 25].

Classic neo-Darwinian population genetics rests on the notion
of gradual differences arising (i.e., via point mutation) within a population. 
Populations are defined by firm geological and species barriers that con- 
strain the flow of genetic information. By contrast, mobile genetic elements 
(MGEs) such as plasmids [76], viruses [68], and transposons [58] are capable 
of producing large genomic changes and can jump between (classically de- 
fined) populations by crossing species barriers. In particular, horizontal gene 
transfer — the transfer of genes between organisms that are not related by 
heredity — creates novel gene combinations by shuffling genetic material be- 
tween organisms that share the same environment. Such gene transfers can 
have an enormous impact on the process of biological evolution [4,5.7511511 
[331I3Q]. 

The growing realization of the widespread nature of horizontal gene transfer
(HGT) is bringing about a re-evaluation of some of the fundamental notions in
biology, such as the species concept [34, 35]. Drawing general conclusions
about the frequency and effects of HGT requires massive amounts of genome
data. Experimentally, advances in high-throughput sequencing have allowed
researchers to sequence environmental samples, identifying microbial taxa and
gene compositions [52].

The lessons from statistical physics can be fruitfully translated to
evolutionary biology. Methods from statistical physics often utilize an
abundance of data in order to draw out general characteristics and conclusions
about the central features that describe system behavior. Statistical
approaches have already been implemented for many biological systems including
metabolic networks [41, 83] and ecology [44, 24]. Statistical physics is
potentially important for understanding the emergent properties of dynamical
biological systems, such as the characteristics and key features of evolution
with HGT. Studies of collective phenomena such as synchronization in fireflies
[62] or pattern formation in predator-prey dynamics [14] rely on techniques
for studying many individual trajectories in order to arrive at a condensed,
more principled understanding of system behavior. HGT can be thought of as an
interaction that couples the many potentially different genomes in a
population of organisms. Indeed, the language of statistical physics is often
the most appropriate for describing how modern cells evolved.

In the remainder of this article, we briefly introduce the biology of hori- 
zontal gene transfer and describe some of the ways in which HGT plays an 
important role in biological systems. We then describe how HGT requires an 
extension of "classical" notions of evolution, and how this is being accom- 
plished by recent efforts to model the effect of HGT on genetic evolution. 
In order to predict the outcomes of HGT, concepts from statistical physics 
can be useful, for example in interpreting ecological data. In particular we 
consider a striking trend that has emerged from the sampling of environmen- 
tal DNA extracted from marine microbes, namely the variation in density 
of a particular type of mobile genetic element (transposons) with depth in 
the ocean. We argue that an intuitive notion of genome entropy, describing 
the variable regions of a genome (i.e., not the core, conserved genes), leads to
trends very similar to those observed. 

2 The extent of horizontal gene transfer 

Horizontal gene transfer, or HGT, is a ubiquitous feature of genome
evolution. For the purposes of this article, we consider an event to be HGT if
it involves any transfer or introduction of genetic material that does not
stem from cellular replication. Evidence for HGT is wide-ranging. It occurs 
within and between all domains of life @M31[5ni[M115] • The range of HGT 
encompasses the entire scale of organismal complexity, from viruses [H] to 
multicellular eukaryotes [50]. Time is apparently not a barrier: HGT events
ranging from ancient [96] to very recent [42] have been reported. Indeed,
there appears to be no absolute barrier to HGT, and we conclude that it is 
a generic feature of genome dynamics. 

Microbial genomes also contain and interact with a large variety of mobile 
genetic elements (MGEs) including plasmids [72], viruses [55], and trans- 
posons [55]. Counts of horizontally transferred genes identified from G+C 
content found them to make up anywhere from 1.5 to 14.5% of microbial
genomes [31]. Characterizations of HGT in different gene families found that
34% of all gene families were identified as having undergone HGT at some
point in their evolutionary history [19].

Examples of genes transferred between evolutionarily distinct microbes
include rhodopsins in marine bacteria and archaea [21, 29]. These
photosynthesis genes have been linked not to a particular organism so much as
to a particular environment, and are so-called 'cosmopolitan genes' [29].
These genes appear to be readily transferred and incorporated into many
different microbes, and appear to aid them in harvesting light. The idea of a
gene being adapted and more tenaciously linked to an environment than to a
particular microbe departs radically from the more classical notions of
vertical descent with mutation. In such cases, we are implicitly taking a more
gene-centric than organism-centric view of evolution.
While examples such as the photosynthetic rhodopsins depict a fairly
harmless tale of environmentally adapted genes, the same evolutionary
mechanisms are also responsible for a medical disaster that has arisen from a
lack of appreciation for the potential swiftness of evolution. Antibiotic and
drug resistance genes rapidly adapt to new hosts, meaning that once a single
population of pathogens becomes resistant to treatment, many more may acquire
resistance soon thereafter [10, 11]. Moreover, once the genes for antibiotic
or drug resistance develop in some pathogen, they seem to persist over long
periods of time in some form. Thus, while avoiding a particular antibiotic
treatment for long periods of time has resulted in the loss of resistance,
very rapid re-emergence of resistance occurs once the antibiotic is
reintroduced [70]. Interestingly, HGT of antibiotic resistance genes from
antibiotic-producing bacteria does not appear to have played the major role in
the evolution of antibiotic resistance in the clinical setting [3]. Instead,
most antibiotic resistance genes appear to have originated and diversified in
other environmental bacteria. They were then disseminated widely, and these
underlying genes formed the basis for the development of antibiotic resistance
in pathogens and commensal bacteria [3]. Resistance that seems to have been
developed in commensal microbes has made its way back to more open
environments in soil and water [51].

Potential barriers to HGT appear to be abundant [82]. First, genes must
be delivered into a microbe. This already presents a number of barriers.
Viruses and plasmids have limited host ranges [27]. Naturally competent
microbes such as Neisseria are capable of taking up raw DNA from their
environment (a process known as transformation), but uptake is limited to DNA
containing certain sequence motifs [57, 15]. Genes which are not favorable are
rapidly deleted and lost from a genome [71]. In order to be retained on longer
timescales, acquired genes must clear a series of hurdles. In order
to benefit its host, the newly-introduced gene must somehow be expressed. 
Depending on the gene's history, it may now be subject to a new regulatory 
scheme. Even once expressed, the gene may not be expressed with the right 
timing or in the right amounts to benefit its new host. Moreover, the newly 
expressed product has not yet been adapted to the host environment, which 
may lead to a greater chance of unwanted or deleterious interactions with 
other proteins in the new cellular environment. Ultimately, the horizontally- 
transferred gene must bypass the many levels of regulation and positively 
impact the organism it has entered in a short time in order to be retained. 

These barriers appear to be more surmountable the more closely related
the donor and recipient are, and as such, HGT is generally believed to occur
more often between the same or similar species [27]. This tendency is
amplified by the fact that organisms of the same species are also spatially
correlated. However, overall there appear to be no absolute barriers to HGT
despite the numerous barriers to any individual trial. Along with how many HGT
events are retained in the long term, another pertinent question is how
frequently HGT occurs on a shorter timescale. By directly counting the number
of times each plasmid was horizontally transferred in a population of
lab-grown Escherichia coli, Babic et al. were able to determine that plasmids
were transferred approximately once per cell generation [6]. Thus, while the
chance of an individual HGT event persisting in a foreign host for very long
might be low, the number of attempts is seemingly high.

3 Detecting horizontal gene transfer 

Despite the extent of HGT, the evolutionary history of organismal lineages is 
preserved through the consistent phylogenies derived from a number of core 
subsystems including translation, transcription, and DNA replication |95 p i3 [ 
ITS] . These subsystems are highly-conserved and present in every cell. The 
consistent pattern of relationships within these major cellular subsystems 
defines microbial taxonomy. HGT can then be detected as gene relation- 
ships that differ greatly from this canonical organismal phylogeny. Horizon- 
tal movement of genetic material is detected by comparing the evolutionary 
relationships between genes in different species against their organismal
relationships. This can be done using a number of metrics including
correlations in sequence distance [28], phylogeny [7, 54], or gene composition
[84, 20].
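
As a minimal illustration of a composition-based screen, the sketch below flags genes whose G+C fraction is atypical for the genome they sit in. It is not the method of the cited studies: the function names, the z-score criterion, and the cutoff of 2.0 are all assumptions chosen for illustration, and a real analysis would need a proper null model for compositional variation.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def flag_atypical_genes(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Return names of genes whose G+C fraction lies more than z_cutoff
    population standard deviations from the mean over all supplied genes.

    Hypothetical heuristic: both the z-score test and the default cutoff
    are illustrative assumptions, not values taken from the literature.
    """
    fractions = {name: gc_fraction(seq) for name, seq in genes.items()}
    mu = mean(fractions.values())
    sigma = pstdev(fractions.values())
    if sigma == 0:
        return []  # every gene has the same composition; nothing stands out
    return [name for name, f in fractions.items() if abs(f - mu) / sigma > z_cutoff]
```

A gene flagged this way is only a candidate for horizontal acquisition; as noted above, transfers between closely related organisms with similar composition would pass such a screen undetected.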

It is noteworthy that many of these methods rely on properly characterizing
the statistical distributions of gene properties [28, 84, 20]. HGT detection
methods generally rely on there being significant differences between
organismal and gene patterns of descent. Limited resolution within the
organismal phylogeny, or evolutionary relationships, makes it difficult to
detect HGT between closely related species. Thus, HGT is nearly undetectable
between the organisms for which it is expected to occur the most frequently.
Since we lack the resolution to distinguish organismal relationships between
closely related species, we cannot track their HGTs through sequence analysis.
Theoretical models of the role of HGT in biology therefore become all the more
important for discerning both the extent and possible effects of HGT on
organismal evolution.

What sequence analysis cannot see in nature, fluorescence microscopy 
has managed to visualize in the laboratory. By tracking an Escherichia coli 
plasmid with a fluorescent marker, Babic et al. were able to directly visualize 
HGT [6]. While these studies do not tell us much about the general rate of
HGT between species, they can measure the level of activity of particular 
MGEs, providing us with a general idea of what the limits of HGT might be. 

HGT may also be detectable indirectly through its influence on genome
organization. Genomes are not unstructured chains of genes, but apparently
possess an architecture that, particularly in the case of microbes, can assist
in gene expression and genome evolution. One of the most frequently
encountered structures in biology is modularity: a complex network (e.g.,
metabolic or gene regulatory) that can be decomposed into independent (i.e.,
weakly interacting), internally connected functional parts that can evolve
separately with minimal disruption to the system as a whole [73, 38, 88, 53].
Modular networks can arise when the environment is fluctuating in time,
creating a modularly varying potential for the dynamics [48], or more
generally, selecting for the organizational structure that can change in the
most facile manner [57]. For a similar reason, a spatially heterogeneous
environment, such as might arise after an extinction event, can also promote
the emergence of modularity [49] as a suddenly vacated ecological niche
becomes available for colonization. One way for modularity to arise is through
the horizontal transfer of a gene or collection of nearby genes that code for
a particular part of a network [78, 39, 56], thereby accelerating the
dynamics. There is evidence that networks can indeed grow by acquiring genes
in groups (known as operons, which govern coupled reactions in the cell), and
that these are attached preferentially at the edges of the existing network
[66].

4 Modeling horizontal gene transfer in biology using statistical 
mechanics 

MGEs such as viruses outnumber microbes by over 10-to-1 in environmental
samples [79, 90], a fact suggestive of their larger role in microbial ecology
[80] and evolution [33, 30]. Assessing the role of these MGEs in the ecosystem,
however, is difficult. Doing so by direct experimental measurements, such as 
sequencing, is essentially impossible. Even if one knew every environmental 
DNA sequence and biochemical reaction within an environment across time, 
the task of reconstructing the effect of HGT on the entire ecosystem would 
be akin to calculating the structural soundness of a skyscraper on the basis 
of the positions and properties of its elemental particles. 

The importance of studying a problem at an appropriate scale applies to 
biology as well as physics. The goal of ecology is to understand the principles 
underlying change and stability of populations. For example, the role of a 
microbial species is something of consequence, whereas the role of an indi- 
vidual microbe is not. Ecological modeling seeks to answer questions about 
the general nature of local processes that give rise to global behaviors or 
properties. Understanding how local and global processes are related can 
give us an outline of the forces driving the system. 

There are a number of examples in the literature of insight gained from
coarse-graining biological systems, studied using statistical mechanics ideas
ranging from spin glass models [18, 39] to non-equilibrium statistical
mechanics. An example of the latter is the phenomenon of speciation. For
example, analysis of the codon usage of genes extracted from libraries of
Escherichia coli strains indicates that speciation may have arisen as a result
of HGT [SD]. A statistical mechanical model shows specifically how HGT can
give rise to speciation (global genome sequence divergence) in a population of
closely related organisms by seeding the propagation of mutational fronts
[86]. In
this study, statistical mechanics was used to model the system-wide conse- 
quences of the interplay between point mutation and homologous recombina- 
tion, following a single HGT event in microbes. In asexual organisms, such as 
bacteria, homologous recombination allows genome sequences to be repaired 
and thus made more uniform in a population, while point mutations are a 
source of genome disorder. Species-specific biological details are important, 
because the successful insertion of a piece of alien DNA in a pre-existing 
bacterial genome relies on the ends of the insertion matching with both sur- 
rounding pre-existing base pairs. This requirement of matching ends is absent 
in some bacteria, or only enforced at one end in others. In addition, the cell 
has mechanisms to prevent the insertion of alien DNA, but these mecha- 
nisms become less effective the closer the alien DNA sequence matches the 
region it replaces. The behavior of such a population of interacting genomes
can be explored by Monte Carlo simulation, and the phase diagram mapped out
as a function of the rates of point mutation and recombination.
Interestingly, there is a generically first-order phase transition between a
state with a monodisperse population and a state with a diverse population.
The transition occurs through what have come to be known as diversification
fronts, which propagate along the genome over the course of evolutionary time.
The front propagation arises because around an insertion, the disruption of
the canonical sequence means that recombination is locally suppressed, leading
to the build-up of point mutations and the extension of the region of sequence
divergence. Such fronts are predicted to occur in strains of Bacillus cereus
but not in strains of Buchnera aphidicola, owing to the details of their
mismatch repair mechanisms, and these predictions were confirmed by
comparative genomics studies on the fully sequenced genomes of these
organisms. This mechanism for speciation would leave behind a genome that has
a mosaic structure, corresponding to the merging of several diversification
fronts arising from distinct horizontal gene transfer events. Indeed, such
puzzling genome features have been observed in an environment where naively
one would have expected extreme selection to have provided essentially no
diversity [85]. The mechanism discussed here is especially interesting, being
a counterexample to the popular notion that speciation arises purely as a
result of Darwinian selection.

Other features have been linked to HGT through modeling, including
modularity [75], the optimality of the genetic code [57], and phase variation
of biofilms [15] (a microbial analogue of multicellular differentiation).
Population and gene heterogeneity within an environment is counterbalanced in
microbial systems by mechanisms such as homologous recombination that serve as
an additional homogenizing force [92].

HGT greatly impacts the safety of the biotechnology industry. Monitoring
and modeling the spread of genes such as virulence factors enhances public
safety and helps the development of better lab practices [64]. In addition,
MGEs can have impact beyond the movement of genes between organisms. In
oceanic carbon cycling, viruses are thought to "kill the winner," since
having the most available hosts should feed back positively into greater
predation [80]. This dynamic may play an important role in the diversification
of ocean microbes by removing dominant species [91] and in the boom-bust
cycles of planktonic blooms, some of which can extend for hundreds of square
miles [72]. Both questions are important to biodiversity and ecological
stability. Plankton blooms choke off oxygen and nutrients within a large portion
of the ocean and may be the result of trace minerals or other man-made 
pollutants |5rJUTU] . The factors regulating these boom-bust cycles are a topic 
of current debate [51 1351151] . 

5 Horizontal gene transfer and genome entropy 

Quantitative modeling shows that the early benefit from HGT can explain 
certain general properties of biology including the emergence of a universal 
genetic code [87] and modularity [78]. Modern life involves a complex web of
enzymatic interactions, bringing additional interaction and regulatory
barriers to HGT. However, examples of recent beneficial HGT are abundant and
reveal that HGT is still occurring in modern organisms. HGT serves as an
effective means for modulating mutation rate within an organism. Modeling
shows that mutation rate is indeed selectable [53]. Examples of organisms
altering their mutation rates include the SOS response in Escherichia coli
[59] and the competence response of Bacillus subtilis [ID].

HGT accelerates organismal evolution by allowing for the exchange of 
genes between two organisms, populations, or species. HGT modifies the 
types of genetic changes and enhances the amount of total genetic mutation. 
In that sense, HGT can be thought of as a source of disorder in a genome,
in effect raising its information entropy [1]; thus, we will informally use
the notion of a genome "temperature" to represent the level of disorder in a
genome. Analogies between HGT and temperature have been made in the context of
studies of the evolution of biological complexity using digital organisms [5],
and also in the evolution of cells and genomic annealing [93].
Genomic entropy roughly correlates to the amount of genomic change per 
unit time. This directly affects both the rate of information loss and organis- 
mal adaptation. Genomic annealing refers to the concept that early forms of 
life had few barriers and much to gain from HGT. This resulted in massive 
HGT that slowed down or became quenched as barriers arose due in part to 
the increasing complexity of cellular life. 
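
One hypothetical way to make the informal notion of genome disorder concrete is to compute the Shannon entropy of each site across a population of equal-length genomes and average over sites. The sketch below does only that. The function names and the choice of per-site averaging are assumptions made for illustration; they are not a definition adopted in this article, which uses entropy only in an intuitive sense.

```python
import math
from collections import Counter


def site_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the letters observed at one genome site."""
    total = len(column)
    counts = Counter(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_genome_entropy(genomes: list[str]) -> float:
    """Average per-site Shannon entropy over a population of genomes.

    Assumes (for illustration only) that all genomes have the same length
    and are already aligned site by site.
    """
    if not genomes:
        raise ValueError("need at least one genome")
    length = len(genomes[0])
    if length == 0 or any(len(g) != length for g in genomes):
        raise ValueError("genomes must be non-empty and of equal length")
    columns = ("".join(g[i] for g in genomes) for i in range(length))
    return sum(site_entropy(col) for col in columns) / length
```

Under this measure a clonal population has zero entropy, while any process that introduces differences between genomes, whether point mutation, HGT, or transposition, raises it.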

At first glance, these two aspects of genome "temperature" appear to be 
quite different. However, they can be understood as different sides of the same 
coin. Genomic entropy is a property that reflects environmental information. 
For success, changes to the environment must be met with organismal adap- 
tation. In other words, the nature of genome plasticity must be reflective 
of fluctuations in the environment. At the same time, not all aspects of the 
environment are in constant flux. Ideally a genome would keep well-adapted 
genomic elements constant and only change those that need to change. This is 
the concept that leads to genomic annealing, whereby barriers to entropy (in 
the form of HGT) arise from increasing complexity. Here we will not attempt 
to address issues of how to define complexity; instead, we will attempt to 
see if the concept of genome entropy in the information sense can be helpful 
phenomenologically. 

Competition and changing environments dictate that organisms that can 
quickly adapt to these ever changing circumstances will hold an advantage 
over their neighbors. It seems natural that a readily available adaptive mech- 
anism such as HGT would be utilized for exactly these reasons. 

While the frequency of HGT in environmental microbes is inaccessible through
direct measures, the importance of maintaining evolutionary "temperature" can
be inferred from metagenomic surveys of the ocean [52]. Konstantinidis et al.
sequenced over 200 Mbp of a random whole-genome shotgun (WGS) library obtained
from a depth of 4,000 m at Station ALOHA in the Pacific Ocean. The per-bp
density of a typical type of transposon known as an insertion sequence (IS),
which can be identified by genes that code for transposase, was measured and
compared to that of other available WGS sequence data from ocean water at
various depths. Overall, they found that transposon density increases with
ocean depth and proposed that a relaxation of purifying selection at deeper
ocean depths allows the proliferation of these 'selfish' gene elements [52].
In other words, the transposons are simply unchecked by negative (purifying)
selection. Although they are viewed as deleterious,
they are allowed to grow in number due to the lack of competition between 
organisms. However, nutrients are scarce in the deep ocean and energy is at 
a premium: in such situations, organisms tend toward more efficient genomes 
rather than allow them to become disordered. In fact, an explanation based 
on the assumed role of purifying selection highlights another interesting and 
important problem in marine microbial ecology: what is the source of genetic 
diversity in microbial populations? Naively, one would expect that in the 
narrowly-defined environmental niche of the ocean where there is a limited 
supply of nutrient and light, the number of coexisting species, in equilibrium, 
cannot be greater than the number of limiting constraints [46]. Observations
are in sharp disagreement with this selection-based idea, and the
contradiction has been a vigorous source of debate in the biological
literature. Amongst possible proposed solutions to the paradox are spatial and
temporal environmental variability [71] and phage predation [69, 9], but a
satisfying quantitative explanation is still lacking.

We propose here an alternative explanation for the apparent increase in
transposons with ocean depth, one tied to the notion of evolutionary
temperature. HGT provides a means of rapid evolution, both increasing the
overall mutation rate and transferring functional genetic material between
organisms. Examples of viruses encoding important functional genes in the
ocean include cyanophage [77] and genes for the photosystem II core reaction
center [55]. As prevalent as this may seem in the epipelagic zone of the
ocean, below that zone microbial population densities fall off inversely with
ocean depth [52]. With reduced average density, the opportunities for HGT also
decrease. On the other hand, transposons shuffle genetic material within a
cell, and this serves to enhance organismal adaptability. One might
anticipate, therefore, that to maintain a collection of interacting genomes at
the same temperature, or equivalently, at the same level of disorder,
contributions from all varieties of MGE need to be included. In particular, as
the HGT rate goes down, the density of other MGEs, principally transposons,
would be expected to increase to compensate.

This argument can be translated into a rough scaling argument, as follows.
The number density $\rho$ of cells is a decreasing function of depth, falling
off roughly as $\rho \sim d^{-1}$, where $d$ is the depth in the ocean,
according to the data [52]. The probability of HGT events per base pair,
either conjugal or mediated by an intermediary, scales as
$P_{\mathrm{HGT}} \sim \rho^{2}$, assuming some sort of law of mass action. On
the other hand, transposons shuffle genes within a cell's genome, and this
process actually requires two IS elements in order for the transposition event
to take place along the genome, so that the probability of each transposition
event is proportional to the square of the number density of IS elements,
$P_{\mathrm{IS}} \sim \rho_{\mathrm{IS}}^{2}$. Making the assumption that
there is a uniform genome entropy with depth, and that the HGT and
transposition events are independent, we obtain that

$$P_{\mathrm{HGT}} \times P_{\mathrm{IS}} \sim \text{constant} < 1. \qquad (1)$$

Fig. 1 Schematic of processes present in simulation (panel label: Genome View)


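Making the usual sort of mean field approximation, we then obtain that

$$\rho \times \rho_{\mathrm{IS}} \sim \text{constant} \qquad (2)$$

where $\rho$ is the cell number density and $\rho_{\mathrm{IS}}$ the
IS-element density, and thus that $\rho_{\mathrm{IS}} \sim d$, a result in
rough agreement with the available data.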
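
The chain of steps in this scaling argument can be written out explicitly. The block below only restates the assumptions already made in the text (cell density falling off as inverse depth, mass-action HGT, two IS elements per transposition event), writing $\rho$ for the cell number density, $\rho_{\mathrm{IS}}$ for the IS-element density, and $d$ for depth; it adds no new assumptions.

```latex
\begin{align*}
P_{\mathrm{HGT}} &\sim \rho^{2}, \qquad
P_{\mathrm{IS}} \sim \rho_{\mathrm{IS}}^{2}, \\
P_{\mathrm{HGT}} \times P_{\mathrm{IS}}
  &\sim \rho^{2}\,\rho_{\mathrm{IS}}^{2} \sim \text{constant}
  \;\Longrightarrow\; \rho\,\rho_{\mathrm{IS}} \sim \text{constant}, \\
\rho \sim d^{-1}
  &\;\Longrightarrow\; \rho_{\mathrm{IS}} \sim \rho^{-1} \sim d .
\end{align*}
```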

In order to test this idea, we simulated populations of microbes under
continuous selection pressure for a fixed target sequence at different
population densities. A schematic of the processes that impact our simulations
is given in Fig. 1. For initial conditions, we take genomes of length 300
whose letters are selected randomly from an alphabet of size 20. These genomes
are each initialized randomly and a variable number N of them are placed into
an array of size 10000. The population in each subset then remains fixed, but
competition is allowed whereby a fitter organism may replace a less fit
organism. Point mutation, deletion, HGT, and transposons are allowed. Point
mutation randomly exchanges one letter of the genome for another randomly
assigned letter, while deletion removes the letter entirely. HGT randomly
selects a segment 10 letters long from one genome and inserts it into another.
The model for transposon behavior is based on insertion sequence (IS) element
behavior. In this model, an IS is represented by a specific 2-letter
combination. As shown in Fig. 2, an IS can non-conservatively insert (i.e.,
copy-paste as opposed to cut-paste) itself elsewhere within a genome. When
2 IS elements are within a fixed distance of each other (20 in this
simulation), then either they transpose the entire length of genome in
between, including themselves, or the region between them is deleted through
homologous recombination, each event occurring with equal probability. The
rates of mutation, deletion, and transposition are then fixed at 0.001, 3.0,
and 0.5 per generation per organism, respectively. The rate of HGT is fixed at
1.0 per opportunity. In order for HGT or competition (whereby one organism
overwrites another) to occur, two slots in the population array are selected
randomly. If both slots contain a living organism, then HGT occurs according
to the HGT rate. In this way, as population decreases, so does the number of
HGT opportunities. The parameters were chosen with an eye toward obtaining the
correct relative rates for each of the processes, as mimicking the orders of
magnitude difference between mutation rate and HGT ($10^{9}$) becomes
impractical computationally. We examined mutation rate and system size scaling
for a related system and found that our results were qualitatively robust
[17].

Fig. 2 Schematic of how insertion sequences (IS) act to transpose and delete
genetic material. Panel labels: "Deletion - i.e., Homologous Recombination,
Mismatch Repair" and "Insertion - i.e., Composite Transposon".
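
The slot-pairing rule in the simulation description is what ties HGT opportunities to population density, and a small sketch can make the dependence visible. The code below is not the simulation code used for the results; it is a minimal stand-in that fills N of 10000 slots, repeatedly draws two slots, and counts how often both are occupied. The function name, the number of trials, the fixed seed, and the choice to draw two distinct slots are all assumptions made for illustration.

```python
import random


def hgt_opportunity_fraction(n_organisms: int, array_size: int = 10_000,
                             trials: int = 50_000, seed: int = 0) -> float:
    """Estimate the fraction of random slot pairs in which both slots hold
    a living organism, i.e. the fraction of draws that are HGT opportunities.

    Assumption (not stated in the text): the two slots are distinct.
    """
    if not 0 <= n_organisms <= array_size:
        raise ValueError("n_organisms must lie between 0 and array_size")
    rng = random.Random(seed)
    # Occupancy map: True marks a slot holding a living organism.
    occupied = [True] * n_organisms + [False] * (array_size - n_organisms)
    rng.shuffle(occupied)
    opportunities = 0
    for _ in range(trials):
        i, j = rng.sample(range(array_size), 2)  # two distinct slots
        if occupied[i] and occupied[j]:
            opportunities += 1
    return opportunities / trials


if __name__ == "__main__":
    for n in (1_000, 2_500, 5_000, 10_000):
        print(n, hgt_opportunity_fraction(n))
```

For an occupancy fraction N/10000, the estimate should come out close to the square of that fraction, which is the mass-action scaling assumed for HGT in the scaling argument.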

Genes within the genome are identified as a gapless subset of the genome 
that shares a common subsequence with the target sequence, which is of 
length 10. The organismal fitness is then given by 

$$F = \min\left(\sum_{i=0}^{n} g_i,\ 1\right) - n\mu \qquad (3)$$

where $g_i$ is the length of the $i$th gene (note that, here, length relates directly 
to the number of consecutive letters that match the target sequence), $n$ is 
the total number of genes within a genome, and $\mu$ is a penalty of 0.2 for each 
gene copy present in the genome. This fitness rewards genomes containing 
subsequences matching a particular target sequence. To some extent, there 
can be a fitness gain from multiple copies of a gene, but this gain is not 
limitless and many short matches to the target sequence are penalized. The 
fitness function described here has a basis in a recently proposed model for 
the evolution of novel gene function [12]. A similar fitness function was also 
at the basis of related theoretical work on the transposon dynamics of recent 
obligate associations [17]. 

[Figure 3: plot of simulation results; axis label: Transposon Density.] 

Fig. 3 Transposon density versus population density. We simulated the adaptation 
of organisms toward a fixed target sequence allowing point mutation, deletion, 
HGT, and transposon dynamics for 20,000 generations and plot the averaged final 
transposon densities of 100 simulations for each point. Different population 
densities provide different numbers of opportunities for HGT. Our simulation 
results show that at lower population densities, where HGT is reduced, 
transposons increase in relative abundance. This is consistent with the idea 
that transposons serve as a substitute for the evolutionary dynamic provided by 
HGT in shallower waters. In ocean waters, population density is inversely 
proportional to depth below the Epipelagic Zone (>200 m), to first order 
approximation [52]. 
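
As a worked illustration of the fitness of Eq. (3), the following sketch evaluates it for a few hypothetical genomes. We assume here that each gene length is expressed as a fraction of the target length (10 letters), so that the cap at 1 corresponds to one full-length match; this normalization, the function name, and the example inputs are assumptions, while the penalty of 0.2 per gene copy is the value quoted in the text.

```python
TARGET_LEN = 10   # target sequence length stated in the text
MU = 0.2          # penalty per gene copy stated in the text

def fitness(match_lengths, target_len=TARGET_LEN, mu=MU):
    """Capped sum of gene lengths minus a per-copy penalty, as in Eq. (3).

    match_lengths holds, for each gene, the number of consecutive letters
    that match the target. Assumption: each gene length is that number
    divided by target_len.
    """
    g = [m / target_len for m in match_lengths]
    return min(sum(g), 1.0) - len(g) * mu

# Hypothetical genomes, described by the match lengths of their genes.
examples = {
    "one full-length gene": [10],
    "one partial gene": [6],
    "two complementary partial genes": [6, 4],
    "two full-length genes": [10, 10],
    "five short fragments": [2, 2, 2, 2, 2],
}
for label, lengths in examples.items():
    print(label, round(fitness(lengths), 3))
```

Under these assumptions a second partial copy can raise the fitness above that of a single partial gene, whereas duplicating a full-length gene or accumulating many short fragments only adds to the penalty term.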

Fig. [3] plots the results from our simulation of microbial competition ac- 
cording to the rules outlined above across a population density gradient. This 
tests the hypothesis that the apparent increase in transposon density is re- 
lated to decreasing levels of HGT due to the decreasing population densities 
in deeper waters. As shown, we do indeed see an increase in the transposon 
density that corresponds with decreasing population density. However, in or- 
der to assert that this is due to the lack of HGT and not other effects such as 
population size or competition timescales (which have also slowed down due 
to competition requiring two organisms to interact), we must isolate these 
other effects and focus on HGT alone. 
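
The link between population density and HGT opportunities follows from the pairing rule described earlier: two slots are drawn and transfer is possible only if both are occupied. A minimal sketch of this rule is given below. The function names, the use of two distinct slots, the choice of the first slot as donor, and the example genome are assumptions made for illustration; the segment length of 10 and the HGT rate of 1.0 per opportunity are taken from the text.

```python
import random

SEGMENT_LEN = 10  # length of a transferred segment, as stated in the text

def hgt_transfer(donor, recipient, rng, seg_len=SEGMENT_LEN):
    """Insert a random seg_len-letter segment of donor into recipient."""
    if len(donor) < seg_len:
        return recipient  # assumption: too-short donors transfer nothing
    start = rng.randrange(len(donor) - seg_len + 1)
    pos = rng.randrange(len(recipient) + 1)
    return recipient[:pos] + donor[start:start + seg_len] + recipient[pos:]

def hgt_opportunity(population, rng, hgt_rate=1.0):
    """Draw two distinct slots; transfer only if both hold a living organism.

    Empty slots are represented by None. Returns True if a transfer occurred.
    """
    i, j = rng.sample(range(len(population)), 2)
    donor, recipient = population[i], population[j]
    if donor is None or recipient is None:
        return False
    if rng.random() >= hgt_rate:
        return False
    population[j] = hgt_transfer(donor, recipient, rng)
    return True

def occupied_pair_probability(n_alive, n_slots):
    """Probability that two distinct randomly chosen slots are both occupied."""
    return n_alive * (n_alive - 1) / (n_slots * (n_slots - 1))

rng = random.Random(0)
n_slots, trials = 200, 2000
for n_alive in (200, 100, 50):
    population = ["ACGT" * 5] * n_alive + [None] * (n_slots - n_alive)
    hits = sum(hgt_opportunity(population, rng) for _ in range(trials))
    print(n_alive / n_slots, hits / trials,
          occupied_pair_probability(n_alive, n_slots))
```

In this sketch the fraction of draws that lead to a transfer falls roughly as the square of the occupied fraction, so halving the population density reduces the HGT opportunities by about a factor of four.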

Fig. [4] shows what happens when we allow a fixed population size to 
adapt under different HGT rates. The trend of increasing transposon density 
holds for decreasing HGT rate. This is consistent with our hypothesis that 
the transposon density trend seen in the ocean corresponds to lower HGT 
rates in the ocean depths. Reduction of HGT leads to increased transposon 
numbers as originally postulated. 

Fig. 4 Transposon density versus HGT rate for fixed population size. Keeping 
population size at the fixed value of 1,000, we plot the transposon density after 
10,000 generations averaged across 100 simulations per point. The deletion rate 
is 2 per generation per organism. All other parameters are the same as described 
in the main text. In order to verify that the increase in transposon density seen 
in Fig. [3] is the result of an HGT:transposon evolutionary tradeoff, we eliminate 
other sources of variation by fixing the population size and varying the HGT rate. 
This serves to eliminate the effects of differing population sizes and competition 
timescales present in the previous simulation. In this simulation, we show increases 
in transposon density with decreasing HGT rate. We plot a power law line with an 
exponent of 2 for reference. The line does not represent a fit to the data. 

The possibility that some notion of 'adaptability' underlies transposon 
dynamics points toward two very interesting ideas. The first is that trans- 
posons, as well as other mobile genetic elements such as viruses and plasmids, 
could be more than just parasitic gene sequences feeding off host genomes. 
Instead, they may be required for the long term survival, adaptability, and 
diversification of an organismal lineage. The second is that evolutionary dy- 
namics such as transposon proliferation may be driven by generic processes 
rather than governed by the specific histories of individual populations. This 
means that the properties of biological organisms can and should be 
understood from the viewpoint of the statistics of a physical process rather than 
the particulars of a historical accident. 
6 Summary 

HGT couples together unrelated organismal lineages in a way that requires us 
to rethink the classical point-mutation based notions of population genetics. 
HGT has been shown to be a prevalent force, especially in microbial 
evolution. The impact of HGT on gene and organismal evolution is not yet 
fully understood. However, notions from statistical physics such as 
temperature and annealing have been applied to evolutionary dynamics before and 
will no doubt continue to play a role in determining the underlying rules of 
evolution. We showed one example of how theory can play a role by offering 
up a quantitative hypothesis on the role of HGT in determining how trans- 
poson densities vary with ocean depth. Our example involves very generic 
interactions and does not depend on the particular environment or microbial 
species. It is exactly these non-specific types of studies that are important if 
we are to unify our understanding of the dynamics of genome evolution. 